Closed form word embedding alignment

Authors

Sunipa Dev, Safia Hassan, Jeff M. Phillips

Abstract

We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as absolute orientation, which are popular for aligning objects in three dimensions, and generalize an approach by Smith et al. (ICLR 2017). We prove new results for optimal scaling and for maximizing cosine similarity. Then, we demonstrate how to evaluate the similarity of embeddings from different sources or mechanisms, and show that certain properties like synonyms and analogies are preserved across the embeddings and can be enhanced by simply averaging ensembles of embeddings.
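For orientation, the closed-form core of this family of methods is the orthogonal Procrustes problem: for embedding matrices A and B (n x d, rows indexed by the same vocabulary), the rotation R minimizing ||AR - B||_F is R = UV^T, where U S V^T is the singular value decomposition of A^T B. The following Python/NumPy sketch illustrates an absolute-orientation-style alignment with optional translation and scaling; the function name and flags are illustrative assumptions, and the scaling convention (scaling A after rotation) is only one of the variants such methods consider, not necessarily the paper's exact formulation.

    # A minimal sketch of closed-form alignment in the absolute-orientation
    # style (illustrative; not the authors' reference implementation).
    import numpy as np

    def align_embeddings(A, B, translate=True, scale=True):
        """Map the rows of A (n x d) onto B (n x d); rows of both matrices
        correspond to the same vocabulary. Returns the transformed A."""
        A = np.asarray(A, dtype=float).copy()
        B = np.asarray(B, dtype=float)
        if translate:
            # Closed-form optimal translation: center both point sets.
            A = A - A.mean(axis=0)
            B = B - B.mean(axis=0)
        # Closed-form optimal rotation (orthogonal Procrustes):
        # for A^T B = U S V^T, the minimizer of ||A R - B||_F is R = U V^T.
        U, S, Vt = np.linalg.svd(A.T @ B)
        A = A @ (U @ Vt)
        if scale:
            # Closed-form optimal scale: rotation preserves ||A||_F, so the
            # s minimizing ||s A R - B||_F is trace(S) / ||A||_F^2.
            A = A * (S.sum() / (A ** 2).sum())
        return A

    # Sanity check: recover a random orthogonal transform of a toy embedding.
    rng = np.random.default_rng(0)
    B = rng.normal(size=(1000, 50))
    Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
    print(np.allclose(align_embeddings(B @ Q, B), B - B.mean(axis=0)))  # True

Note that R obtained this way may include a reflection; classical absolute orientation in three dimensions often restricts to proper rotations (det R = +1), a constraint that is usually unnecessary when aligning embeddings.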

Similar Articles

Consistent Alignment of Word Embedding Models

Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as clustering similar words and inferring learning relationships, many challenges and open research questions remain. In this paper, we propose a solution that...

Bayesian Neural Word Embedding

Recently, several works in the domain of natural language processing presented successful methods for word embedding. Among them, the Skip-Gram (SG) with negative sampling, also known as word2vec, advanced the state-of-the-art of various linguistics tasks. In this paper, we propose a scalable Bayesian neural word embedding algorithm that can be beneficial to general item similarity tasks as well...

Word to word alignment strategies

Word alignment is a challenging task aiming at the identification of translational relations between words and multi-word units in parallel corpora. Many alignment strategies are based on links between single words. Different strategies can be used to find the optimal word alignment using such one-to-one word links, including relations between multi-word units. In this paper seven algorithms are ...

Category Enhanced Word Embedding

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words that present similar co-occurrence statistics. Besides local occurrence statistics, global topical information is also important knowledge that may help discrim...

Semantic Word Embedding (SWE)

Contents (excerpt): Chapter 1, Semantic Word Embedding; 1.1, The Skip-gram model; 1.2, SWE as Constrained Optimization ...

Journal

Journal title: Knowledge and Information Systems

Year: 2021

ISSN: 0219-3116, 0219-1377

DOI: https://doi.org/10.1007/s10115-020-01531-7